Abstract

A coloring of a tree is convex if the vertices that pertain to any color induce a connected subtree; 
a partial coloring (which assigns colors to some of the vertices) is convex if it can be completed to a 
convex (total) coloring. Convex colorings of trees arise in areas such as phylogenetics and linguistics. 
For example, a perfect phylogenetic tree is one in which the states of each character induce a convex coloring of 
the tree. Research on perfect phylogeny is usually focused on finding a tree so that a few predetermined 
partial colorings of its vertices are convex. 

When a coloring of a tree is not convex, it is desirable to know "how far" it is from a convex one. In 
[19], a natural measure for this distance, called the recoloring distance, was defined: the minimal number 
of color changes at the vertices needed to make the coloring convex. This can be viewed as minimizing 
the number of "exceptional vertices" with respect to a closest convex coloring. The problem was proved to be 
NP-hard even for colored strings. 

In this paper we continue the work of [19] and present a 2-approximation algorithm for convex recoloring 
of strings whose running time is O(cn), where c is the number of colors and n is the size of the input, and 
an O(cn^2)-time 3-approximation algorithm for convex recoloring of trees. 

1 Introduction 

A phylogenetic tree is a tree which represents the course of evolution for a given set of species. The 
leaves of the tree are labelled with the given species. Internal vertices correspond to hypothesized, extinct 
species. A character is a biological attribute shared among all the species under consideration, although 
every species may exhibit a different character state. Mathematically, if X is the set of species under 
consideration, a character on X is a function C from X into a set C of character states. A character on a 
set of species can be viewed as a coloring of the species, where each color represents one of the character's 
states. A natural biological constraint is that the reconstructed phylogeny have the property that each 
of the characters could have evolved without reverse or convergent transitions: In a reverse transition 
some species regains a character state of some old ancestor whilst its direct ancestor has lost this state. 
A convergent transition occurs if two species possess the same character state, while their least common 
ancestor possesses a different state. 

In graph theoretic terms, the lack of reverse and convergent transitions means that the character is 
convex on the tree: for each state of this character, all species (extant and extinct) possessing that state 
induce a single block, which is a maximal monochromatic subtree. Thus, the above discussion implies that 
in a phylogenetic tree, each character is likely to be convex or "almost convex". This makes convexity a 
fundamental property in the context of phylogenetic trees, to which a lot of research has been dedicated 
throughout the years. The Perfect Phylogeny (PP) problem, whose complexity has been studied extensively, 
seeks a phylogenetic tree that is simultaneously convex on each of the 
input characters. Maximum parsimony (MP) [10, 21] is a very popular tree reconstruction method that 
seeks a tree which minimizes the parsimony score, defined as the number of mutated edges summed 
over all characters (therefore, PP is a special case of MP). Another criterion for estimating the 
distance of a phylogeny from convexity is the phylogenetic number, defined as the maximum number 
of connected components a single state induces on the given phylogeny (obviously, phylogenetic number 
one corresponds to a perfect phylogeny). Convexity is a desired property in other areas of classification 
besides phylogenetics. For instance, in [3] a method called TNoM is used to classify genes, based on data 
from gene expression extracted from two types of tumor tissues. The method finds a separator on a binary 
vector which minimizes the number of "1"s on one side and "0"s on the other, and thus defines a convex 
vector of minimum Hamming distance to the given binary vector. In [14], distance from convexity is 
used (although not explicitly) to show a strong connection between strains of Tuberculosis and their human 
carriers. 

In a previous work [19], we defined and studied a natural distance from a given coloring to a convex 
one: the recoloring distance. In the simplest, unweighted model, this distance is the minimum number of 
color changes at the vertices needed to make the given coloring convex (for strings this reduces to Hamming 
distance from a closest convex coloring). This model was extended to a weighted model, where changing 
the color of a vertex v costs a nonnegative weight w(v). The most general model studied in [19] is the 
non-uniform model, where the cost of coloring vertex v by a color d is an arbitrary nonnegative number 
cost(v, d). 

It was shown in [19] that finding the recoloring distance in the unweighted model is NP-hard even 
for strings (trees with two leaves), and a few dynamic programming algorithms for exact solutions of a few 
variants of the problem were presented. 

In this work we present two polynomial time, constant ratio approximation algorithms, one for strings 
and one for trees. Both algorithms are for the weighted (uniform) model. The algorithm for strings is 
based on a lower bound technique which assigns penalties to colored trees. The penalties can be computed 
in O(cn) time, and once a penalty is computed, a recoloring whose cost is smaller than the penalty is 
computed in linear time. The 2-approximation follows by showing that for a string, the penalty is at most 
twice the cost of an optimal convex recoloring. This last result does not hold for trees, where a different 
technique is used. The algorithm for trees is based on a recursive construction that uses a variant of the 
local ratio technique, which allows adjustments of the underlying tree topology during the recursive 
process. 

The rest of the paper is organized as follows. In the next section we present the notations and define the 
models used. In Section 3 we define the notion of penalty, which provides lower bounds on the optimal cost 
of convex recoloring of any tree. In Section 4 we present the 2-approximation algorithm for strings. 
In Section 5 we briefly explain the local ratio technique, and present the 3-approximation algorithm for 
trees. We conclude and point out future research directions in Section 6. 

2 Preliminaries 

A colored tree is a pair $(T, C)$ where $T = (V, E)$ is a tree with vertex set $V = \{v_1, \ldots, v_n\}$, and $C$ is a 
coloring of $T$, i.e., a function from $V$ onto a set of colors $\mathcal{C}$. For a set $U \subseteq V$, $C|_U$ denotes the restriction 
of $C$ to the vertices of $U$, and $C(U)$ denotes the set $\{C(u) : u \in U\}$. For a subtree $T' = (V(T'), E(T'))$ of 
$T$, $C(T')$ denotes the set $C(V(T'))$. A block in a colored tree is a maximal set of vertices which induces a 
monochromatic subtree. A $d$-block is a block of color $d$. The number of $d$-blocks is denoted by $n_b(C, d)$, or 
$n_b(d)$ when $C$ is clear from the context. A coloring $C$ is said to be convex if $n_b(C, d) = 1$ for every color 
$d \in \mathcal{C}$. The number of $d$-violations in the coloring $C$ is $n_b(C, d) - 1$, and the total number of violations of 
$C$ is $\sum_{d \in \mathcal{C}} (n_b(C, d) - 1)$. Thus a coloring $C$ is convex iff the total number of violations of $C$ is zero 
(the above sum, taken over all characters, has been used as a measure of the distance of a given phylogenetic tree 
from perfect phylogeny). 
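
As a small illustration of these definitions, the following Python sketch counts the d-blocks of a total coloring and the total number of violations. The adjacency-dictionary representation of the tree and the function names `count_blocks` and `total_violations` are assumptions made for this example; they are not notation from the paper.

```python
from collections import defaultdict


def count_blocks(adj, color):
    """Return {d: n_b(C, d)} for a tree given as an adjacency dict.

    adj   -- dict mapping each vertex to a list of its neighbors
    color -- dict mapping every vertex to its color (a total coloring)
    """
    seen = set()
    blocks = defaultdict(int)
    for start in adj:
        if start in seen:
            continue
        # Flood-fill one maximal monochromatic subtree (one block).
        blocks[color[start]] += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen and color[v] == color[u]:
                    seen.add(v)
                    stack.append(v)
    return dict(blocks)


def total_violations(adj, color):
    """Sum over colors d of (n_b(C, d) - 1); zero iff the coloring is convex."""
    return sum(n - 1 for n in count_blocks(adj, color).values())


# A path a-b-c-d colored red, blue, red, red: red splits into two blocks.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = {"a": "red", "b": "blue", "c": "red", "d": "red"}
print(count_blocks(adj, color))      # red has 2 blocks, blue has 1
print(total_violations(adj, color))  # 1
```

Each vertex is visited once, so the count takes time linear in the size of the tree.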

The definition of convex coloring is extended to partially colored trees, in which the coloring $C$ assigns 
colors to some subset of vertices $U \subseteq V$, which is denoted by $Domain(C)$. A partial coloring is said to be 
convex if it can be extended to a total convex coloring (see [22]). Convexity of partial and total colorings 
has a simple characterization by the concept of carriers: For a subset $U$ of $V$, $carrier(U)$ is the minimal 
subtree that contains $U$. For a colored tree $(T, C)$ and a color $d \in \mathcal{C}$, $carrier_T(C, d)$ (or $carrier(C, d)$ when 
$T$ is clear) is the carrier of $C^{-1}(d)$. We say that $C$ has the disjointness property if for each pair of colors 
$\{d, d'\}$ it holds that $carrier(C, d) \cap carrier(C, d') = \emptyset$. It is easy to see that a total or partial coloring $C$ is 
convex iff it has the disjointness property (in [8] convexity is actually defined by the disjointness property). 
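
The carrier characterization translates directly into a convexity test for partial colorings. The Python sketch below is one possible way to do it, under assumptions of our own: the tree is an adjacency dictionary, uncolored vertices are simply absent from the `color` dictionary, and the names `carrier` and `is_convex` are chosen for this example.

```python
def carrier(adj, targets):
    """Vertex set of the minimal subtree of the tree `adj` that contains `targets`."""
    targets = set(targets)
    if not targets:
        return set()
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    # Repeatedly strip leaves that are not in `targets`.
    stack = [v for v in adj if degree[v] <= 1 and v not in targets]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] <= 1 and u not in targets:
                    stack.append(u)
    return alive


def is_convex(adj, color):
    """Disjointness test: True iff the carriers of all colors are pairwise disjoint.

    color -- dict mapping the colored vertices to their colors
             (vertices missing from the dict are uncolored).
    """
    by_color = {}
    for v, d in color.items():
        by_color.setdefault(d, []).append(v)
    owner = {}
    for d, verts in by_color.items():
        for v in carrier(adj, verts):
            if v in owner:  # v lies in the carriers of two colors
                return False
            owner[v] = d
    return True


# A star with an uncolored center m and four leaves.
star = {"m": ["a", "b", "c", "d"], "a": ["m"], "b": ["m"], "c": ["m"], "d": ["m"]}
print(is_convex(star, {"a": "red", "b": "red", "c": "blue"}))
print(is_convex(star, {"a": "red", "b": "red", "c": "blue", "d": "blue"}))
```

In the first call only the red carrier passes through the center, so the partial coloring is convex; in the second call both carriers contain the center and the test fails. Computing one carrier per color gives O(cn) time overall.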

When some (total or partial) input coloring $(T, C)$ is given, any other coloring $C'$ of $T$ is viewed as a 
recoloring of the input coloring $C$. We say that a recoloring $C'$ of $C$ retains (the color of) a vertex $v$ if 
$C(v) = C'(v)$; otherwise $C'$ overwrites $v$. Specifically, a recoloring $C'$ of $C$ overwrites a vertex $v$ either by 
changing the color of $v$, or just by uncoloring $v$. We say that $C'$ retains (overwrites) a set of vertices $U$ if 
it retains (overwrites, resp.) every vertex in $U$. For a recoloring $C'$ of an input coloring $C$, $X_C(C')$ (or just 
$X(C')$) is the set of the vertices overwritten by $C'$, i.e., 

$$X_C(C') = \{v \in V : [v \in Domain(C)] \wedge [(v \notin Domain(C')) \vee (C(v) \neq C'(v))]\}.$$

With each recoloring $C'$ of $C$ we associate a cost, denoted as $cost_C(C')$ (or $cost(C')$ when $C$ is understood), 
which is the number of vertices overwritten by $C'$, i.e., $cost_C(C') = |X_C(C')|$. A coloring $C^*$ is 
an optimal convex recoloring of $C$, or in short an optimal recoloring of $C$, and $cost_C(C^*)$ is denoted by 
$OPT(T, C)$, if $C^*$ is a convex coloring of $T$, and $cost_C(C^*) \le cost_C(C')$ for any other convex coloring $C'$ 
of $T$. 

The above cost function naturally generalizes to the weighted version: the input is a triplet $(T, C, w)$, 
where $w : V \to \mathbb{R}^+ \cup \{0\}$ is a weight function which assigns to each vertex $v$ a nonnegative weight $w(v)$. For 
a set of vertices $X$, $w(X) = \sum_{v \in X} w(v)$. The cost of a convex recoloring $C'$ of $C$ is $cost_C(C') = w(X(C'))$, 
and $C'$ is an optimal convex recoloring if it minimizes this cost. 
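
The overwritten set and the two uniform cost functions can be computed in a few lines. The sketch below assumes colorings are dictionaries from vertices to colors, with uncolored vertices absent, and that no color is represented by `None`; the names `overwritten` and `cost` are illustrative only.

```python
def overwritten(old, new):
    """X_C(C'): vertices colored by `old` that `new` recolors or leaves uncolored."""
    return {v for v, d in old.items() if new.get(v) != d}


def cost(old, new, w=None):
    """Unweighted cost |X_C(C')|, or the weighted cost w(X_C(C')) when `w` is given."""
    changed = overwritten(old, new)
    if w is None:
        return len(changed)
    return sum(w[v] for v in changed)


old = {1: "a", 2: "b", 3: "a"}
new = {1: "a", 2: "a", 3: "a", 4: "c"}   # vertex 4 was uncolored in `old`
weights = {1: 5, 2: 7, 3: 1, 4: 2}
print(overwritten(old, new))   # {2}
print(cost(old, new))          # 1
print(cost(old, new, weights)) # 7
```

Coloring a previously uncolored vertex (vertex 4 above) is free, while uncoloring a colored vertex counts as overwriting it, matching the definition of $X_C(C')$.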

The above unweighted and weighted cost models are uniform, in the sense that the cost of a recoloring 
is determined by the set of overwritten vertices, regardless of the specific colors involved. [19] also defines a 
more subtle non-uniform model, which is not studied in this paper. 

Let AL be an algorithm which receives as an input a weighted colored tree (T, C, w) and outputs a 
convex recoloring of (T, C, w), and let AL(T, C, w) be the cost of the convex recoloring output by AL. 
We say that AL is an r-approximation algorithm for the convex tree recoloring problem if for all inputs 
(T, C, w) it holds that $AL(T, C, w)/OPT(T, C, w) \le r$. 

We complete this section with a definition and a simple observation which will be useful in the sequel. 
Let $(T, C)$ be a colored tree. A coloring $C^*$ is an expanding recoloring of $C$ if in each block of $C^*$ at least 
one vertex $v$ is retained (i.e., $C(v) = C^*(v)$). 

Observation 2.1 Let $(T = (V, E), C, w)$ be a weighted colored tree, where $w(V) > 0$. Then there exists an 
expanding optimal convex recoloring of $C$. 

Proof. Let $C'$ be an optimal recoloring of $C$ which uses a minimum number of colors (i.e., $|C'(V)|$ is 
minimized). We shall prove that $C'$ is an expanding recoloring of $C$. 

Since $w(V) > 0$, the claim is trivial if $C'$ uses just one color. So assume for contradiction that $C'$ uses 
at least two colors, and that for some color $d$ used by $C'$, there is no vertex $v$ s.t. $C(v) = C'(v) = d$. 
Then there must be an edge $(u, v)$ such that $C'(u) = d$ but $C'(v) = d' \neq d$. Therefore, in the uniform cost 
model, the coloring $C''$ which is identical to $C'$ except that all vertices colored $d$ are now colored by $d'$ is 
an optimal recoloring of $C$ which uses a smaller number of colors, a contradiction. ■ 

In view of Observation 2.1 above, we assume in the sequel (sometimes implicitly) that the given optimal 
convex recolorings are expanding. 

3 Lower Bounds via Penalties 

In this section we present a general lower bound on the recoloring distance of weighted colored trees. 
Although for a general tree this bound can be fairly poor, in the next section we show that for strings it is 
at least half the optimal cost, and then we use this fact to obtain a 2-approximation algorithm for strings. 
Let $(T, C, w)$ be a weighted colored tree. For a color $d$ and $U \subseteq V(T)$ let: 

$$penalty_{C,d}(U) = w(U \cap \overline{C^{-1}(d)}) + w(\overline{U} \cap C^{-1}(d))$$

Informally, when the vertices in $U$ induce a subtree, $penalty_{C,d}(U)$ is the total weight of the vertices which 
must be overwritten to make $U$ the unique $d$-block in the coloring: a vertex $v$ must be overwritten either 
if $v \in U$ and $C(v) \neq d$, or if $v \notin U$ and $C(v) = d$. 
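
The penalty of a candidate block is easy to evaluate directly from this definition. The following Python sketch assumes dictionary-based colorings and weights (with a weight given for every vertex of U) and uses the illustrative name `penalty`; it is a direct transcription of the formula, not an optimized routine.

```python
def penalty(U, color, w, d):
    """penalty_{C,d}(U): weight that must be overwritten to make U the unique d-block.

    U     -- iterable of vertices (intended to induce a subtree)
    color -- dict vertex -> color (uncolored vertices may be absent)
    w     -- dict vertex -> nonnegative weight
    """
    U = set(U)
    # Vertices inside U that do not already carry color d.
    inside_wrong = sum(w[v] for v in U if color.get(v) != d)
    # Vertices outside U that carry color d.
    outside_d = sum(w[v] for v, c in color.items() if c == d and v not in U)
    return inside_wrong + outside_d


# String a b a a b with unit weights, vertices numbered 0..4.
color = {0: "a", 1: "b", 2: "a", 3: "a", 4: "b"}
w = {v: 1 for v in color}
print(penalty({0, 1, 2, 3}, color, w, "a"))  # 1: vertex 1 must become a
print(penalty({4}, color, w, "b"))           # 1: vertex 1 must lose color b
```

For the recoloring a a a a b of this string, the two penalties add up to 2 while a single vertex is overwritten.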

Figure 1: $C'$ is a convex recoloring of $C$ which defines the following penalties: $p_{green}(C') = 1$, $p_{red}(C') = 2$, 
$p_{blue}(C') = 3$. 

The penalty of a given convex recoloring is the sum of the penalties of its colored blocks: Let $C'$ be a 
convex recoloring of $C$. Then: 

$$penalty_C(C') = \sum_{d \in \mathcal{C}} penalty_{C,d}(C'^{-1}(d))$$

Figure 1 depicts the calculation of a penalty associated with a convex recoloring $C'$ of $C$. 

In the sequel we assume that the input colored tree (T, C) is fixed, and omit it from the notations. 

Claim 3.1 $penalty(C') = 2\,cost(C')$ 

Proof. From the definitions we have 

$$penalty(C') = \sum_{d \in \mathcal{C}} w(\{v \in V : C'(v) = d \text{ and } C(v) \neq d\} \cup \{v \in V : C'(v) \neq d \text{ and } C(v) = d\})$$

$$= 2\,w(\{v \in V : C'(v) \neq C(v)\}) = 2\,cost(C')$$

■ 

As can be seen in Figure 1, $penalty(C') = 6$ while $cost(C') = 3$. 

For each color $d$, $p^*_d$ is the penalty of a block which minimizes the penalty for $d$: 

$$p^*_d = \min\{penalty_d(V(T')) : T' \text{ is a subtree of } T\}$$

Corollary 3.2 For any convex recoloring $C'$ of $C$, 

$$\sum_{d \in \mathcal{C}} p^*_d \le \sum_{d \in \mathcal{C}} penalty_d(C'^{-1}(d)) = 2\,cost(C').$$

Proof. The inequality follows from the definition of $p^*_d$, and the equality from Claim 3.1. ■ 

Corollary 3.2 above provides a lower bound on the cost of convex recoloring of trees. It can be shown 
that this lower bound can be quite poor for trees, that is: $OPT(T, C)$ can be considerably larger than 
$(\sum_{d \in \mathcal{C}} p^*_d)/2$. For example, any convex recoloring of the tree in Figure 2 will recolor at least one of the 
big lateral blocks in the tree, while $(\sum_{d \in \mathcal{C}} p^*_d)/2$ in that tree is the weight of the (small) central vertex (the 
circle). However, in the next section we show that this bound can be used to obtain a polynomial time 
2-approximation for convex recoloring of strings. 

Figure 3: The upper part of the figure shows the optimal blocks on the string and the lower part shows 
the coloring returned by the algorithm. 

4 A 2-Approximation Algorithm for Strings 

Let a weighted colored string $(S, C, w)$, where $S = (v_1, \ldots, v_n)$, be given. For $1 \le i \le j \le n$, $S[i, j]$ is the 
substring $(v_i, v_{i+1}, \ldots, v_j)$ of $S$. The algorithm starts by finding for each $d$ a substring $B_d = S[i_d, j_d]$ for 
which $penalty_d(S[i_d, j_d]) = p^*_d$. It is not hard to verify that $B_d$ consists of a subsequence of consecutive 
vertices in which the difference between the total weight of $d$-vertices and the total weight of other vertices 
(i.e., $w(B_d \cap C^{-1}(d)) - w(B_d \setminus C^{-1}(d))$) is maximized, and thus $B_d$ can be found in linear time. We say 
that a vertex $v$ is covered by color $d$ if it belongs to $B_d$; $v$ is covered if it is covered by some color $d$, and it 
is free otherwise. 
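
Finding $B_d$ is a maximum-sum substring problem: give each $d$-vertex the gain $+w(v)$ and every other vertex the gain $-w(v)$, and look for the contiguous run with the largest total gain. The Python sketch below uses the standard Kadane scan for this. The list-based input, 0-based inclusive indices and the name `best_block` are assumptions of this example, and it presumes that color `d` occurs in the string.

```python
def best_block(colors, w, d):
    """Return (i, j, p) where S[i..j] (0-based, inclusive) minimizes penalty_d
    and p is the minimum penalty p*_d.

    colors -- list of vertex colors along the string
    w      -- list of nonnegative vertex weights, same length as colors
    """
    total_d = sum(wk for ck, wk in zip(colors, w) if ck == d)
    best_gain = best_i = best_j = None
    run_gain, run_start = 0, 0
    for k, (ck, wk) in enumerate(zip(colors, w)):
        gain = wk if ck == d else -wk
        if run_gain <= 0:
            # A non-positive prefix never helps: restart the run at k.
            run_gain, run_start = gain, k
        else:
            run_gain += gain
        if best_gain is None or run_gain > best_gain:
            best_gain, best_i, best_j = run_gain, run_start, k
    # penalty_d(U) = w(C^{-1}(d)) - gain(U), so the best gain gives p*_d.
    return best_i, best_j, total_d - best_gain


colors = list("abaab")
w = [1, 1, 1, 1, 1]
print(best_block(colors, w, "a"))  # (2, 3, 1)
print(best_block(colors, w, "b"))  # (1, 1, 1)
```

Minimizing blocks need not be unique; for color a above, S[0..3] has the same penalty as the block reported. Running the scan once per color gives the O(cn) bound for computing all penalties.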

We describe below a linear time algorithm which, given the blocks $B_d$, defines a convex coloring $C'$ so 
that $cost(C') < \sum_{d} p^*_d$, which by Corollary 3.2 is a 2-approximation to a minimal convex recoloring of $C$. 

$C'$ is constructed by performing one scan of $S$ from left to right. The scan consists of at most $c$ stages, 
where stage $j$ defines the $j$-th block of $C'$, to be denoted $F_j$, and its color, $d_j$, as follows. 

Let $d_1$ be a color covering the leftmost covered vertex (note that $v_1$ is either free or covered by $d_1$). $d_1$ is 
taken to be the color of the first (leftmost) block of $C'$, $F_1$, and $C'(v_1)$ is set to $d_1$. For $i > 1$, $C'(v_i)$ is 
determined as follows: Let $C'(v_{i-1}) = d_j$. Then if $v_i \in B_{d_j}$ or $v_i$ is free, then $C'(v_i)$ is also set to $d_j$. Else, 
$v_i$ must be a covered vertex. Let $d_{j+1}$ be one of the colors that cover $v_i$. $C'(v_i)$ is set to $d_{j+1}$ (and $v_i$ is the 
first vertex in $F_{j+1}$). 
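
The scan can be sketched in Python as follows. This is an illustration under our own assumptions rather than the authors' code: blocks are passed as a dictionary from colors to 0-based inclusive index pairs, the function name `scan_recolor` is hypothetical, and the helper array `best` (the block reaching farthest right among those starting at or before each position) is an implementation choice used to pick a covering color in constant time.

```python
def scan_recolor(n, blocks):
    """One left-to-right scan producing the coloring C' from the blocks B_d.

    n      -- number of vertices of the string
    blocks -- dict color -> (i, j), the block B_d = S[i..j], 0-based inclusive
    Returns a list of n colors (all None if `blocks` is empty).
    """
    # best[k] = (right end, color) of the block reaching farthest right
    # among all blocks that start at or before position k.
    best = [(-1, None)] * n
    for d, (i, j) in blocks.items():
        if j > best[i][0]:
            best[i] = (j, d)
    for k in range(1, n):
        if best[k - 1][0] > best[k][0]:
            best[k] = best[k - 1]

    new = [None] * n
    current = None  # color d_j of the block F_j currently being built
    for k in range(n):
        end, d = best[k]
        covered = end >= k
        if current is not None:
            lo, hi = blocks[current]
            if not covered or lo <= k <= hi:
                new[k] = current      # free vertex, or still inside B_{d_j}
                continue
        if covered:
            current = d               # open F_{j+1} with a color covering v_k
            new[k] = current
    # Leading free vertices join the first block F_1.
    first = next((c for c in new if c is not None), None)
    return [first if c is None else c for c in new]


# Blocks for the string a b a a b: B_a = S[0..3], B_b = S[4..4].
print(scan_recolor(5, {"a": (0, 3), "b": (4, 4)}))
# A string with free vertices at positions 0, 3 and 5.
print(scan_recolor(6, {"x": (1, 2), "y": (4, 4)}))
```

The first call yields a a a a b and the second x x x x y y. Both the preprocessing and the scan take O(n + c) time.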

Observation 4.1 $C'$ is a convex coloring of $S$. 

Proof. Let $d_j$ be the color of the $j$-th block of $C'$, $F_j$, as described above. The convexity of $C'$ follows 
from the following invariant, which is easily proved by induction: For all $j \ge 1$, $\bigcup_{k=1}^{j} F_k \supseteq \bigcup_{k=1}^{j} B_{d_k}$. 
This means that, for all $j$, no vertex to the right of $F_j$ is covered by $d_j$, and hence no such vertex is colored 
by $d_j$. The observation follows. ■ 

Thus it remains to prove 

Lemma 1 $cost(C') < \sum_{d \in \mathcal{C}} p^*_d$. 

Proof. Let $v_i$ be a vertex which contributes to $cost(C')$. Then $C(v_i) = d$ and $C'(v_i) = d'$ for some distinct 
$d', d$. By the algorithm, either $v_i \in B_{d'}$, or $v_i$ is free. In the first case $v_i$ contributes to both $p^*_d$ and $p^*_{d'}$, and 
in the second it contributes to $p^*_d$. The inequality is strict since in each block $F_j$ there is at least one vertex 
for which the former case holds. ■ 



5 A 3-Approximation Algorithm for Trees 

In this section we present a polynomial time algorithm which approximates the minimal convex coloring of 
a weighted tree by factor three. The input is a triplet $(T, C, w)$, where $w$ is a nonnegative weight function 
and $C$ is a (possibly partial) coloring whose domain is the set $support(w) = \{v \in V : w(v) > 0\}$. 

We first introduce the notion of covers w.r.t. colored trees. A set of vertices $X$ is a convex cover (or 
just a cover) for a colored tree $(T, C)$ if the (partial) coloring $C_X = C|_{[V \setminus X]}$ is convex (i.e., $C$ can be 
transformed to a convex coloring by overwriting the vertices in $X$). Thus, if $C'$ is a convex recoloring 
of $(T, C)$, then $X_C(C')$, the set of vertices overwritten by $C'$, is a cover for $(T, C)$. Moreover, deciding 
whether a subset $X \subseteq V$ is a cover for $(T, C)$, and constructing a total convex recoloring $C'$ of $C$ such 
that $X(C') \subseteq X$ in case it is, can be done in $O(n \cdot n_c)$ time. Also, the cost of a recoloring $C'$ is $w(X(C'))$. 
Therefore, finding an optimal convex total recoloring of $C$ is polynomially equivalent to finding an optimal 
cover $X$, or equivalently a partial convex recoloring $C'$ of $C$ so that $w(X(C')) = w(X)$ is minimized. 

Our approximation algorithm makes use of the local ratio technique, which is useful for approximating 
optimization covering problems such as vertex cover, dominating set, minimum spanning tree, feedback 
vertex set and more. We hereafter describe it briefly: 

The input to the problem is a triplet $(V, \Sigma \subseteq 2^V, w : V \to \mathbb{R}^+)$, and the goal is to find a subset $X \in \Sigma$ such 
that $w(X)$ is minimized, i.e., $w(X) = OPT(V, \Sigma, w) = \min_{Y \in \Sigma} w(Y)$ (in our context $V$ is the set of vertices, 
and $\Sigma$ is the set of covers). The local ratio principle is based on the following observation: 

Observation 5.1 For every two weight functions $w_1, w_2$: 

$$OPT(V, \Sigma, w_1) + OPT(V, \Sigma, w_2) \le OPT(V, \Sigma, w_1 + w_2)$$

Now, given our initial weight function $w$, we select $w_1, w_2$ s.t. $w_1 + w_2 = w$ and $|support(w_1)| < |support(w)|$. 
We first apply the algorithm to find an $r$-approximation to $(V, \Sigma, w_1)$ (in particular, if $V \setminus support(w_1)$ 
is a cover, then it is an optimal cover to $(V, \Sigma, w_1)$). Let $X$ be the solution returned for $(V, \Sigma, w_1)$, and 
assume that $w_1(X) \le r \cdot OPT(V, \Sigma, w_1)$. If we could also guarantee that $w_2(X) \le r \cdot OPT(V, \Sigma, w_2)$ then 
by Observation 5.1 we are guaranteed that $X$ is also an $r$-approximation for $(V, \Sigma, w_1 + w_2 = w)$. The 
original property, introduced in [4], which was used to guarantee that $w_2(X) \le r \cdot OPT(V, \Sigma, w_2)$ is that 
$w_2$ is $r$-effective, that is: for every $X \in \Sigma$ it holds that $w_2(X) \le r \cdot OPT(V, \Sigma, w_2)$ (note that if $V \in \Sigma$, the 
above is equivalent to requiring that $w_2(V) \le r \cdot OPT(V, \Sigma, w_2)$). 

Theorem 5.2 Given $X \in \Sigma$ s.t. $w_1(X) \le r \cdot OPT(V, \Sigma, w_1)$. If $w_2$ is $r$-effective, then $w(X) = 
w_1(X) + w_2(X) \le r \cdot OPT(V, \Sigma, w)$. 
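
The theorem follows by chaining its two hypotheses with Observation 5.1. Spelled out, with $\Sigma$ denoting the family of feasible solutions as above, the first inequality uses the assumption on $w_1(X)$ together with the $r$-effectiveness of $w_2$, and the second uses Observation 5.1:

```latex
\begin{align*}
w(X) &= w_1(X) + w_2(X) \\
     &\le r \cdot OPT(V, \Sigma, w_1) + r \cdot OPT(V, \Sigma, w_2) \\
     &\le r \cdot OPT(V, \Sigma, w_1 + w_2) \;=\; r \cdot OPT(V, \Sigma, w).
\end{align*}
```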



We start by presenting two applications of Theorem 5.2 to obtain a 3-approximation algorithm for 
convex recoloring of strings and a 4-approximation algorithm for convex recoloring of trees. 

3-string-APPROX: 

Given an instance of the convex weighted string problem $(S, C, w)$: 

1. If $V \setminus support(w)$ is a cover then $X \leftarrow V \setminus support(w)$. Else: 

2. Find 3 vertices $x, y, z \in support(w)$ s.t. $C(x) = C(z) \neq C(y)$ and $y$ lies between $x$ and $z$. 

(a) $\varepsilon \leftarrow \min\{w(x), w(y), w(z)\}$ 

(b) $w_2(v) \leftarrow \varepsilon$ if $v \in \{x, y, z\}$, and $w_2(v) \leftarrow 0$ otherwise 

(c) $w_1 \leftarrow w - w_2$ 

(d) $X \leftarrow$ 3-string-APPROX$(S, C|_{support(w_1)}, w_1)$ 

Note that if a (partial) coloring of a string is not convex then the condition in step 2 must hold. It is also 
easy to see that $w_2$ is 3-effective, since any cover $Y$ must contain at least one vertex from any triplet 
described in step 2, hence $w_2(Y) \ge \varepsilon$ while $w_2(V) = 3\varepsilon$. 

The above algorithm cannot serve for approximating convex tree coloring since in a tree the condition 
in step 2 might not hold even if $V \setminus support(w)$ is not a cover. In the following algorithm we generalize this 
condition to one which must hold in any non-convex coloring of a tree, at the price of increasing the 
approximation ratio from 3 to 4. 

4-tree-APPROX: 

Given an instance of the convex weighted tree problem $(T, C, w)$: 

1. If $V \setminus support(w)$ is a cover then $X \leftarrow V \setminus support(w)$. Else: 

2. Find two pairs of (not necessarily distinct) vertices $(x_1, x_2)$ and $(y_1, y_2)$ in $support(w)$ s.t. 
$C(x_1) = C(x_2) \neq C(y_1) = C(y_2)$, and $carrier(\{x_1, x_2\}) \cap carrier(\{y_1, y_2\}) \neq \emptyset$: 

(a) $\varepsilon \leftarrow \min\{w(x_i), w(y_i) : i = 1, 2\}$ 

(b) $w_2(v) \leftarrow \varepsilon$ if $v \in \{x_1, x_2, y_1, y_2\}$, and $w_2(v) \leftarrow 0$ otherwise 

(c) $w_1 \leftarrow w - w_2$ 

(d) $X \leftarrow$ 4-tree-APPROX$(T, C|_{support(w_1)}, w_1)$ 

The algorithm is correct since if there are no two pairs as described in step 2 then $V \setminus support(w)$ is a 
cover. Also, it is easy to see that $w_2$ is 4-effective. Hence the above algorithm returns a cover with weight 
at most $4 \cdot OPT(T, C, w)$. 

We now describe algorithm 3-tree-APPROX. Informally, the algorithm uses an iterative method, in the 
spirit of the local ratio technique, which approximates the solution of the input $(T, C, w)$ by reducing it 
to $(T', C', w_1)$ where $|support(w_1)| < |support(w)|$. Depending on the given input, this reduction is either 
of the local ratio type (via an appropriate 3-effective weight function) or the input graph is replaced by a 
smaller one which preserves the optimal solutions. 

Figure 4: Case 2: a vertex v is contained in 3 different carriers. 

3-tree-APPROX(T, C, w) 

On input (T, C, w) of a weighted colored tree, do the following: 

1. If V \ support(w) is a cover then X <— V \ support(w). Else: 

2. $(T', C', w_1) \leftarrow$ REDUCE$(T, C, w)$. \\ The function REDUCE guarantees that 
$|support(w_1)| < |support(w)|$. 

(a) $X' \leftarrow$ 3-tree-APPROX$(T', C', w_1)$. 

(b) $X \leftarrow$ UPDATE$(X', T)$. \\ The function UPDATE guarantees that if $X'$ is a 3-approximation 
to $(T', C', w_1)$, then $X$ is a 3-approximation to $(T, C, w)$. 

Next we describe the functions REDUCE and UPDATE, by considering a few cases. In the first two 
cases we employ the local ratio technique. 

Case 1: $support(w)$ contains three vertices $x, y, z$ such that $y$ lies on the path from $x$ to $z$ and $C(x) = 
C(z) \neq C(y)$. 

In this case we use the same reduction as in 3-string-APPROX: Let $\varepsilon = \min\{w(x), w(y), w(z)\} > 0$. Then 
REDUCE$(T, C, w) = (T, C|_{support(w_1)}, w_1)$, where $w_1(v) = w(v)$ if $v \notin \{x, y, z\}$, else $w_1(v) = w(v) - \varepsilon$. The 
same arguments which imply the correctness of 3-string-APPROX imply that if $X'$ is a 3-approximation 
for $(T', C', w_1)$, then it is also a 3-approximation for $(T, C, w)$, thus we set UPDATE$(X', T) = X'$. 
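
The Case 1 reduction is exactly the step of 3-string-APPROX. Unrolled into a loop on a string, it could look like the Python sketch below. The list-based input, the helper names `find_triple` and `three_string_approx`, and the use of integer weights (to avoid floating-point residue when subtracting) are assumptions made for illustration.

```python
def find_triple(colors, w):
    """Return positions (x, y, z), x < y < z, all of positive weight, with
    colors[x] == colors[z] != colors[y]; return None if no such triple exists."""
    last = {}     # last positive-weight position seen for each color
    prev = None   # previous positive-weight position
    for z, (c, wz) in enumerate(zip(colors, w)):
        if wz <= 0:
            continue
        if c in last and colors[prev] != c:
            return last[c], prev, z
        last[c] = z
        prev = z
    return None


def three_string_approx(colors, w):
    """Local ratio loop: returns a cover X (a set of positions) for the string."""
    w = list(w)  # work on a copy; the caller's weights stay intact
    while True:
        triple = find_triple(colors, w)
        if triple is None:
            # V \ support(w) is now a cover.
            return {v for v, wv in enumerate(w) if wv == 0}
        eps = min(w[v] for v in triple)
        for v in triple:
            w[v] -= eps  # w_1 = w - w_2, where w_2 is eps on the triple


print(three_string_approx(list("abaab"), [1, 1, 1, 1, 1]))
print(three_string_approx(list("abaab"), [5, 1, 5, 5, 1]))
```

With unit weights the loop returns the cover {0, 1, 2}, of weight 3, while an optimal cover (vertex 1 alone) has weight 1, so the factor 3 is attained on this input. With the second weight vector it returns the optimal cover {1}. Every iteration zeroes at least one weight, so there are at most n iterations.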
Case 2: Not Case 1, and $T$ contains a vertex $v$ such that $v \in \bigcap_{i=1}^{3} carrier(d_i, C)$ for three distinct colors 
$d_1$, $d_2$ and $d_3$ (see Figure 4). 

In this case we must have that $w(v) = 0$ (else Case 1 would hold), and there are three designated pairs 
of vertices $\{x_1, x_2\}$, $\{y_1, y_2\}$ and $\{z_1, z_2\}$ such that $C(x_i) = d_1$, $C(y_i) = d_2$, $C(z_i) = d_3$ ($i = 1, 2$), and $v$ 
lies on each of the three paths connecting these three pairs (see Figure 4). We set REDUCE$(T, C, w) = 
(T, C|_{support(w_1)}, w_1)$, where $w_1$ is defined as follows. 
Let $\varepsilon = \min\{w(x_i), w(y_i), w(z_i) : i = 1, 2\}$. Then $w_1(u) = w(u)$ if $u$ is not in one of the designated 
pairs, else $w_1(u) = w(u) - \varepsilon$. Finally, any cover for $(T, C)$ must contain at least two vertices from the 
set $\{x_i, y_i, z_i : i = 1, 2\}$, hence $w - w_1 = w_2$ is 3-effective, and by the local ratio theorem we can set 
UPDATE$(X', T) = X'$. 

Case 3: Not Cases 1 and 2. 

Root $T$ at some vertex $r$ and for each color $d$ let $r_d$ be the root of the subtree $carrier(d, C)$. Let $d_0$ be a 
color for which the root $r_{d_0}$ is farthest from $r$. Let $\hat{T}$ be the subtree of $T$ rooted at $r_{d_0}$, and let $\bar{T} = T \setminus \hat{T}$ 
(see Figure 5). By the definition of $r_{d_0}$, no vertex in $\bar{T}$ is colored by $d_0$, and since Case 2 does not hold, 
there is a color $d'$ so that $\{d_0\} \subseteq C(V(\hat{T})) \subseteq \{d_0, d'\}$. 

Figure 5: Case 3: Not Case 1 nor 2. $\hat{T}$ is the subtree rooted at $r_{d_0}$ and $\bar{T} = T \setminus \hat{T}$. 

Subcase 3a: $C(V(\hat{T})) = \{d_0\}$ (see Figure 6). 

In this case, $carrier(d_0, C) \cap carrier(d, C) = \emptyset$ for each color $d \neq d_0$, and for each optimal solution $X$ it 
holds that $X \cap V(\hat{T}) = \emptyset$. We set REDUCE$(T, C, w) \leftarrow (\bar{T}, C|_{V(\bar{T})}, w|_{V(\bar{T})})$. The 3-approximation $X'$ to 
$(T', C', w_1)$ is also a 3-approximation to $(T, C, w)$, thus UPDATE$(X', T) = X'$. 




Figure 6: Case 3a: No vertices of $\hat{T}$ are colored by $d'$. 

We are left with the last case. 

Subcase 3b: $r_{d_0} \in carrier(d_0, C) \cap carrier(d', C)$ (see Figure 7). 

Observe that in this case we have $w(r_{d_0}) = 0$ and $|support(w) \cap V(\hat{T})| \ge 3$, since $V(\hat{T})$ must contain at 
least two vertices colored $d_0$ and at least one vertex colored $d'$. Figure 7 illustrates this case. 

Figure 7: Case 3b: $r_{d_0} \in \hat{T} \cap carrier(d')$ 

Observation 5.3 There is an optimal convex coloring $C'$ which satisfies the following: $C'(v) \neq d_0$ for any 
$v \in V(\bar{T})$, and $C'(v) \in \{d_0, d'\}$ for any $v \in V(\hat{T})$. 

Proof. Let $\hat{C}$ be an expanding optimal convex recoloring of $(T, C)$. We will show that there is an optimal 
coloring $C'$ satisfying the observation such that $cost(C') \le cost(\hat{C})$. Since $\hat{C}$ is expanding and optimal, at least 
one vertex in $\hat{T}$ is colored either by $d_0$ or by $d'$. Let $U$ be a set of vertices in $\hat{T}$ so that $carrier(U)$ is a 
maximal subtree all of whose vertices are colored by colors not in $\{d_0, d'\}$. Then $carrier(U)$ must have a 
neighbor $u$ in $\hat{T}$ s.t. $\hat{C}(u) \in \{d_0, d'\}$. Change the colors of the vertices in $U$ to $\hat{C}(u)$. This procedure can be 
repeated until all the vertices of $\hat{T}$ are colored by $d_0$ or by $d'$, without increasing the cost of the recoloring. 
A similar procedure can be used to change the color of all the vertices in $\bar{T}$ to be different from $d_0$. It is 
easy to see that the resulting coloring $C'$ is convex and $cost(C') \le cost(\hat{C})$. ■ 

The function REDUCE in Subcase 3b is based on the following observation: Let C be any optimal 
recoloring of $T$ satisfying Observation 5.3, and let $s$ be the parent of $r_{d_0}$ in $T$. Then $C'|_{V(\hat{T})}$, the restriction 
of the coloring $C'$ to the vertices of $\hat{T}$, depends only on whether $carrier(d', C')$ intersects $V(\bar{T})$, and 
in this case on whether it contains the vertex $s$. Specifically, $C'|_{V(\hat{T})}$ must be one of the three colorings of $V(\hat{T})$, 
$C_{high}$, $C_{medium}$ and $C_{min}$, according to the following three scenarios: 

1. $carrier(d', C') \cap V(\bar{T}) \neq \emptyset$ and $s \notin carrier(d', C')$. Then it must be the case that $C'$ colors all the 
vertices in $V(\hat{T})$ by $d_0$. This coloring of $\hat{T}$ is denoted as $C_{high}$. 

2. $carrier(d', C') \cap V(\bar{T}) \neq \emptyset$ and $s \in carrier(d', C')$. Then $C'|_{V(\hat{T})}$ is a coloring of minimal possible cost 
of $\hat{T}$ which either equals $C_{high}$ (i.e., colors all vertices by $d_0$), or otherwise colors $r_{d_0}$ by $d'$. This 
coloring of $\hat{T}$ is called $C_{medium}$. 

3. $carrier(d', C') \cap V(\bar{T}) = \emptyset$. Then $C'|_{V(\hat{T})}$ must be an optimal convex recoloring of $\hat{T}$ by the two colors 
$d_0, d'$. This coloring of $\hat{T}$ is called $C_{min}$. 

We will show soon that the colorings $C_{high}$, $C_{medium}$ and $C_{min}$ above can be computed in linear 
time. The function REDUCE in Subcase 3b modifies the tree $T$ by replacing $\hat{T}$ by a subtree $T_0$ 
with only 2 vertices, $r_{d_0}$ and $v_0$, which encodes the three colorings $C_{high}$, $C_{medium}$, $C_{min}$. Specifically, 
REDUCE$(T, C, w) = (T', C', w_1)$ where (see Figure 8): 

• $T'$ is obtained from $T$ by replacing the subtree $\hat{T}$ by the subtree $T_0$ which contains two vertices: a 
root $r_{d_0}$ with a single descendant $v_0$. 

• $w_1(v) = w(v)$ for each $v \in V(\bar{T})$. For $r_{d_0}$ and $v_0$, $w_1$ is defined as follows: $w_1(r_{d_0}) = cost(C_{medium}) - 
cost(C_{min})$ and $w_1(v_0) = cost(C_{high}) - cost(C_{min})$. 

• $C'(v) = C(v)$ for each $v \in V(\bar{T})$; if $w_1(r_{d_0}) > 0$ then $C'(r_{d_0}) = d_0$, and if $w_1(v_0) > 0$ then $C'(v_0) = d'$. 
(If $w_1(u) = 0$ for $u \in \{r_{d_0}, v_0\}$, then $C'(u)$ is undefined.) 

Figure 8 illustrates REDUCE for Subcase 3b. In the figure, $C_{high}$ requires overwriting all $d'$ vertices and 
therefore costs 3, $C_{medium}$ requires overwriting one $d_0$ vertex and costs 2, and $C_{min}$ is the optimal coloring 
for $\hat{T}$ with cost 1. The new subtree $T_0$ reflects these weights with $w_1(r_{d_0}) = cost(C_{medium}) - cost(C_{min}) = 1$ and 
$w_1(v_0) = cost(C_{high}) - cost(C_{min}) = 2$. 

Figure 8: REDUCE of Subcase 3b: $\hat{T}$ is replaced with $T_0$, where $w_1(r_{d_0}) = cost(C_{medium}) - cost(C_{min}) = 1$ and 
$w_1(v_0) = cost(C_{high}) - cost(C_{min}) = 2$. 

Claim 5.4 $OPT(T', C', w_1) = OPT(T, C, w) - cost(C_{min})$. 

Proof. We first show that $OPT(T', C', w_1) \le OPT(T, C, w) - cost(C_{min})$. Let $C^*$ be an optimal 
recoloring of $C$ satisfying Observation 5.3, and let $X^* = X(C^*)$. By the discussion above, we may assume 
that $C^*|_{V(\hat{T})}$ has one of the forms $C_{high}$, $C_{medium}$ or $C_{min}$. Thus, $X^* \cap V(\hat{T})$ is either $X(C_{high})$, $X(C_{medium})$ 
or $X(C_{min})$. We map $C^*$ to a coloring $\tilde{C}$ of $V(T')$ as follows: for $v \in V(\bar{T})$, $\tilde{C}(v) = C^*(v)$. $\tilde{C}$ on $r_{d_0}$ and $v_0$ 
is defined as follows: 

• If $C^*|_{V(\hat{T})} = C_{high}$ then $\tilde{C}(r_{d_0}) = \tilde{C}(v_0) = d_0$, and $cost(\tilde{C}|_{V(T_0)}) = w_1(v_0)$; 

• If $C^*|_{V(\hat{T})} = C_{medium}$ then $\tilde{C}(r_{d_0}) = \tilde{C}(v_0) = d'$, and $cost(\tilde{C}|_{V(T_0)}) = w_1(r_{d_0})$; 

• If $C^*|_{V(\hat{T})} = C_{min}$ then $\tilde{C}(r_{d_0}) = d_0$, $\tilde{C}(v_0) = d'$, and $cost(\tilde{C}|_{V(T_0)}) = 0$. 

Note that in all three cases, $cost(\tilde{C}) = cost(C^*) - cost(C_{min})$. 

The proof of the opposite inequality $OPT(T, C, w) - cost(C_{min}) \le OPT(T', C', w_1)$ is similar. ■ 

Corollary 5.5 $C^*$ is an optimal recoloring of $(T, C, w)$ iff $\tilde{C}$ is an optimal recoloring of $(T', C', w_1)$. 

We can now define the UPDATE function for Subcase 3b: Let $X'$ = 3-tree-APPROX$(T', C', w_1)$. 
Then $X'$ is a disjoint union of the sets $\bar{X}' = X' \cap V(\bar{T})$ and $X'_0 = X' \cap V(T_0)$. Moreover, $X'_0 \in 
\{\{r_{d_0}\}, \{v_0\}, \emptyset\}$. Then $X \leftarrow$ UPDATE$(X') = \bar{X}' \cup \hat{X}$, where $\hat{X}$ is $X(C_{medium})$ if $X'_0 = \{r_{d_0}\}$, is $X(C_{high})$ 
if $X'_0 = \{v_0\}$, and is $X(C_{min})$ if $X'_0 = \emptyset$ (this matches the weights $w_1(r_{d_0}) = cost(C_{medium}) - cost(C_{min})$ and 
$w_1(v_0) = cost(C_{high}) - cost(C_{min})$). Note that $w(X) = w_1(X') + cost(C_{min})$. The following 
inequalities show that if $w_1(X')$ is a 3-approximation to $OPT(T', C', w_1)$, then $w(X)$ is a 3-approximation 
to $OPT(T, C, w)$: 

$$w(X) = w_1(X') + cost(C_{min}) \le 3\,OPT(T', C', w_1) + cost(C_{min})$$
$$\le 3\,(OPT(T', C', w_1) + cost(C_{min})) = 3\,OPT(T, C, w)$$

5.1 A Linear Time Algorithm for Subcase 3b 

In Subcase 3b we need to compute $C_{high}$, $C_{medium}$ and $C_{min}$. The computation of $C_{high}$ is immediate. 
$C_{medium}$ and $C_{min}$ can be computed by the following simple, linear time algorithm that finds a minimal 
cost convex recoloring of a bi-colored tree, under the constraint that the color of a given vertex $r$ is 
predetermined to one of the two colors. 

Let the weighted colored tree $(T, C, w)$ and the vertex $r$ be given, and let $\{d_1, d_2\} = C(T)$. For $i \in \{1, 2\}$, let 
$C_i$ be the minimal cost convex recoloring which sets the color of $r$ to $d_i$ (note that a coloring with minimum 
cost in $\{C_1, C_2\}$ is an optimal convex recoloring of $(T, C)$). We illustrate the computation of $C_1$ (the 
computation of $C_2$ is similar): 

Compute for every edge $e = (u \to v)$, directed away from the root $r$, a cost defined by 

$$cost(e) = w(\{v' : v' \in T(v) \text{ and } C(v') = d_1\}) + w(\{v' : v' \in [T \setminus T(v)] \text{ and } C(v') = d_2\})$$

where $T(v)$ is the subtree rooted at $v$. This can be done by one post order traversal of the tree. Then, 
select the edge $e^* = (u_0 \to v_0)$ which minimizes this cost, and set $C_1(x) = d_2$ for each $x \in T(v_0)$, and 
$C_1(x) = d_1$ otherwise. 
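
The edge-cost computation can be sketched in Python as follows. As in the text, exactly one edge is cut, so the all-$d_1$ coloring is not considered. The adjacency-dictionary input, the function name `recolor_two_colors` and the tie-breaking rule (first minimum in traversal order) are assumptions of this example, and the tree is assumed to have at least two vertices.

```python
def recolor_two_colors(adj, color, w, r, d1, d2):
    """Compute C_1 for a {d1, d2}-colored tree: r keeps color d1, one subtree
    T(v0) is colored d2 and everything else d1. Returns (cost, coloring)."""
    # Iterative DFS from r: parents plus a preorder (parents before children).
    parent = {r: None}
    order = []
    stack = [r]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    # Post-order pass: sub[v][d] = total weight of d-colored vertices in T(v).
    sub = {v: {d1: 0, d2: 0} for v in adj}
    for v in reversed(order):
        if color.get(v) in (d1, d2):
            sub[v][color[v]] += w[v]
        p = parent[v]
        if p is not None:
            sub[p][d1] += sub[v][d1]
            sub[p][d2] += sub[v][d2]

    # cost(u -> v) = w(d1-vertices inside T(v)) + w(d2-vertices outside T(v)).
    total_d2 = sub[r][d2]
    best_cost, v0 = None, None
    for v in order:
        if v == r:
            continue
        c = sub[v][d1] + (total_d2 - sub[v][d2])
        if best_cost is None or c < best_cost:
            best_cost, v0 = c, v

    # Color T(v0) by d2 and the rest by d1 (parents are handled before children).
    new = {}
    for v in order:
        below_cut = v == v0 or (parent[v] is not None and new[parent[v]] == d2)
        new[v] = d2 if below_cut else d1
    return best_cost, new


adj = {"r": ["a", "b"], "a": ["r", "a1", "a2"], "b": ["r"], "a1": ["a"], "a2": ["a"]}
color = {"r": "p", "a": "q", "b": "p", "a1": "q", "a2": "p"}
w = {"r": 1, "a": 2, "b": 1, "a1": 3, "a2": 1}
print(recolor_two_colors(adj, color, w, "r", "p", "q"))
```

In this example the cheapest cut is the edge (r, a): the subtree below a receives color q, and only a2 (weight 1) is overwritten. Both passes touch every vertex a constant number of times, which gives the linear running time.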

5.2 Correctness and complexity 

We now summarize the discussion of the previous sections to show that the algorithm terminates and returns 
a cover $X$ which is a 3-approximation for $(T, C, w)$. 

Let $(T = (V, E), C, w)$ be an input to 3-tree-APPROX. If $V \setminus support(w)$ is a cover then the returned 
solution is optimal. Else, in each of the cases, REDUCE$(T, C, w)$ reduces the input to $(T', C', w_1)$ such 
that $|support(w_1)| < |support(w)|$, hence the algorithm terminates within at most $n = |V|$ iterations. 
Also, as detailed in the previous subsections, the function UPDATE guarantees that if $X'$ is a 3-approximation 
for $(T', C', w_1)$ then $X$ is a 3-approximation to $(T, C, w)$. Thus after at most $n$ iterations 
the algorithm provides a 3-approximation to the original input. 

Checking whether Case 1, Case 2, Subcase 3a or Subcase 3b holds at each stage requires $O(cn)$ time 
for each of the cases, and computing the function REDUCE after the relevant case is identified requires 
linear time in all cases. Since there are at most $n$ iterations, the overall complexity is $O(cn^2)$. Thus we 
have 

Theorem 5.6 Algorithm 3-tree-APPROX is a polynomial time 3-approximation algorithm for the minimum 
convex recoloring problem. 

6 Discussion and Future Work 

In this work we showed two approximation algorithms for colored strings and trees, respectively. The 
2-approximation algorithm relies on the technique of penalizing a colored string, and the 3-approximation 
algorithm for trees extends the local ratio technique by allowing dynamic changes in the underlying 
graph. 

A few interesting research directions which suggest themselves are: 

• Can our approximation ratios for strings or trees be improved? 

• This is a more focused variant of the previous item. A problem has a polynomial approximation 
scheme [11, 15], or is fully approximable [20], if for each $\varepsilon$ it can be $\varepsilon$-approximated in $p_\varepsilon(n)$ time 
for some polynomial $p_\varepsilon$. Are the problems of optimal convex recoloring of trees or strings fully 
approximable (or equivalently, do they have a polynomial approximation scheme)? 

• Alternatively, can any of the variants be shown to be APX-hard? 

• The algorithms presented here apply only to uniform models. The non-uniform model, motivated by 
weighted maximum parsimony [21], assumes that the cost of assigning color $d$ to vertex $v$ is given by 
an arbitrary nonnegative number $cost(v, d)$ (note that, formally, no initial coloring $C$ is assumed in 
this cost model). In this model $cost(C')$ is defined only for a total recoloring $C'$, and is given by the 
sum $\sum_{v \in V} cost(v, C'(v))$. Finding non-trivial approximation results for this model is challenging. 